Abstract

Mobile genetic elements such as DNA transposons are a feature of most genomes. The existence of novel DNA 
transposons can be inferred when whole genome sequencing reveals the presence of hallmarks of mobile elements 
such as terminal inverted repeats (TIRs) flanked by target site duplications (TSDs). A recent report describes a new 
superfamily of DNA transposons in the genomes of a few bacteria and archaea that possess TIRs and TSDs, and encode 
several conserved genes including a cas1 endonuclease gene, previously associated only with CRISPR-Cas adaptive
immune systems. The data strongly suggest that these elements, designated 'casposons', are likely to be bona fide
DNA transposons and that their Cas1 nucleases act as transposases and are possibly still active.
Background

Mobile genetic elements can modify the genomes of the
organisms that harbor them, and their mobility is believed
to be an important factor in evolution (reviewed in [1-5]).
Mobile elements can affect their host by disrupting genes,
modifying control regions, and introducing new proteins or
protein domains into novel genomic locations. One of the
best known examples is the RAG1 protein of jawed
vertebrates, which is a key protein required for the
functioning of the adaptive immune system [6], and whose
catalytic domain originated from the transposase associated
with Transib transposons [7].

One of the most exciting recent advances in microbiology
has been the discovery that an adaptive immune system also
exists in many bacteria and archaea (reviewed in [8-11]).
CRISPR-Cas systems provide a mechanism for prokaryotes to
incorporate short stretches of foreign DNA ('spacers') into
their genomes to archive sequence information on 'non-self'
DNA they have encountered, such as that of viruses or
plasmids. This is called the adaptation stage of the immune
process. Once integrated, these spacers serve as templates
for the synthesis of RNA, which then directs Cas nucleases
to specific foreign nucleic acids in order to degrade them.
Several different types of CRISPR systems have been
identified, and each is associated with a distinct set of
Cas proteins. Only two proteins, Cas1 and Cas2, appear to
be strictly conserved among the various CRISPR systems, and
they are both metal-dependent nucleases. The structure of
the Cas1-Cas2 complex from E. coli strain MG1655 has been
determined [12].

A recent report by Krupovic et al. [13] presents data
suggesting that Cas1 proteins of CRISPR systems originated
from a newly identified superfamily of DNA transposons that
the authors call 'casposons'. If true, an elegant symmetry
emerges in the evolutionary history of the establishment of
adaptive immune systems in higher eukaryotes and in
bacteria and archaea. Furthermore, the discovery of a novel
family of DNA transposases would be a significant addition
to the known repertoire of mechanisms by which mobile
elements are moved [14].

Main text 

The work of Krupovic et al. builds on a previous report on
the evolutionary history of Cas1 proteins, which identified
two groups of Cas1 proteins not associated with CRISPR
loci [9]. One of these groups, designated the Cas1-solo
group 2, has Cas1 genes in a conserved neighborhood that
usually also contains genes for a B family DNA polymerase,
an HNH nuclease, and several helix-turn-helix (HTH) domains
(Figure 1A). The current analysis reveals that this
conserved region is contained between terminal inverted
repeats (TIRs) and is flanked by target site duplications
(TSDs), hallmarks of DNA transposons encoding RNase H-like
transposases (reviewed in [15,16]). Krupovic et al.
propose, on the basis of these features, that these regions
are mobile genetic elements and that the Cas1 proteins are
required for the integration step of transposition. They
further propose that the location of this group of proteins
within the Cas1 phylogeny indicates that they likely
predate the development of CRISPR-Cas systems.
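
The two hallmarks named above can be checked mechanically once a candidate element has been delimited. The following is a minimal sketch in Python, not the procedure used by Krupovic et al.: the function names, the exact-match criteria, and the toy sequences are all assumptions made for illustration.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def has_tir(element: str, tir_len: int, max_mismatches: int = 0) -> bool:
    """True if the first tir_len nt and the last tir_len nt form inverted repeats."""
    left = element[:tir_len].upper()
    right_rc = reverse_complement(element[-tir_len:])
    mismatches = sum(a != b for a, b in zip(left, right_rc))
    return mismatches <= max_mismatches


def has_tsd(locus: str, start: int, end: int, tsd_len: int) -> bool:
    """True if the tsd_len nt just left of locus[start:end] equal the tsd_len nt just right."""
    if start < tsd_len or end + tsd_len > len(locus):
        return False
    return locus[start - tsd_len:start].upper() == locus[end:end + tsd_len].upper()


# Hypothetical toy locus (not a real casposon): flank + TSD + element + TSD + flank.
tsd = "ACGTTGCA"
left_tir = "GGCATCC"
element = left_tir + "AAAATTTTCCCC" + reverse_complement(left_tir)
locus = "TTT" + tsd + element + tsd + "GGG"
start = 3 + len(tsd)
end = start + len(element)

print(has_tir(element, tir_len=7), has_tsd(locus, start, end, tsd_len=8))
```

Real elements rarely have perfect TIRs, which is why the sketch exposes a `max_mismatches` parameter; the appropriate tolerance is a judgment call that this example does not settle.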

The parallels between the proposed mechanism of the
adaptation step of the CRISPR immune system (reviewed in
[17]) and DNA transposition are striking. Cas proteins are
responsible for excising a short spacer segment from
foreign DNA (typically 32 to 38 bp [11], preceded by a 2 to
5 bp protospacer adjacent motif, or PAM) and
site-specifically integrating it into a particular genomic
location at the leader end of a CRISPR locus. Spacer
integration is accompanied by the generation of direct
repeats on either side of the spacer that can vary in size
from 23 to 55 bp [11]. Thus, if the Cas1 nucleases
associated with casposons are involved in catalyzing
transposition, they presumably can sequence-specifically
recognize their TIRs, which for most DNA transposons are
longer than 10 bp [2,15]. They also appear to exhibit
relaxed target DNA recognition properties relative to
CRISPR-Cas systems: whereas spacer integration mediated by
Cas proteins is site-specific, the genomic locations of
casposons suggest that their integration sites are not
highly conserved (in line with the integration properties
of most RNase H-like DNA transposons, with a few notable
exceptions such as the bacterial Tn7 transposon [18]).

Figure 1 Properties of the family 2 casposons. (A) Predicted common protein-coding genes within family 2 casposons include a PolB family
polymerase, an HNH family endonuclease, several HTH domains, and Cas1. The gene color code corresponds to that of Krupovic et al. The green
arrows flanking the casposons indicate target site duplications (TSDs). (B) An alignment of the first 41 nucleotides (nt) of casposon family 2 Left
End Terminal Inverted Repeats (TIRs) reveals conserved sequence motifs which could be the basis of transposase recognition. Green letters indicate
the TSDs and black letters the TIR sequences identified by Krupovic et al., with apparently conserved patterns highlighted in red or blue. Bold
black lettering corresponds to nts that were not included in the analysis of Krupovic et al. The aligned sequences, with the accession number
and coordinates for each, are: MetFor-C1 [NC_019943; 1964105..1964159], MetPsy-C1 [NC_018876; 190336..190390], MetTin-C1 [NZ_AZAJ01000001;
3015399..3015453], MetMaz-C1 [NC_003901; 3946587..3946641], MetMah-C1 [NC_014002; reverse complement of 1332841..1332895], MetLum-C1
[NZ_CAJE01000015; 159864..159918], AciBoo-C1 [NC_013926; 380309..380363], MetArv-C1 [NC_009464; 2695204..2695258].
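
As a small arithmetic check on the coordinates listed in the Figure 1 legend, the sketch below computes the length of each left-end region. The `ACCESSION:start..end` string format and the helper name `span_length` are conventions assumed here for illustration, and the coordinates are treated as 1-based and inclusive, which is also an assumption.

```python
import re

# Left-end regions as listed in the Figure 1 legend.
# MetMah-C1 is given as a reverse complement; strand does not affect length.
LEFT_ENDS = {
    "MetFor-C1": "NC_019943:1964105..1964159",
    "MetPsy-C1": "NC_018876:190336..190390",
    "MetTin-C1": "NZ_AZAJ01000001:3015399..3015453",
    "MetMaz-C1": "NC_003901:3946587..3946641",
    "MetMah-C1": "NC_014002:1332841..1332895",
    "MetLum-C1": "NZ_CAJE01000015:159864..159918",
    "AciBoo-C1": "NC_013926:380309..380363",
    "MetArv-C1": "NC_009464:2695204..2695258",
}


def span_length(region: str) -> int:
    """Length in nt of an 'ACCESSION:start..end' region with inclusive coordinates."""
    match = re.fullmatch(r"([^:]+):(\d+)\.\.(\d+)", region)
    if match is None:
        raise ValueError(f"unrecognized region: {region!r}")
    start, end = int(match.group(2)), int(match.group(3))
    return end - start + 1


lengths = {name: span_length(region) for name, region in LEFT_ENDS.items()}
for name, length in lengths.items():
    print(f"{name}: {length} nt")
```

Every region spans 55 nt. This equals 14 + 41, the TSD size and the aligned TIR length quoted in the text, although the legend itself does not state that the regions are partitioned in that way.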

One of the main ways that transposon superfamilies are
grouped is by the conservation of TIR sequences located at
their transposon ends. At first glance, the 19 putative
casposon TIR sequences identified and analyzed by Krupovic
et al. appear disconcertingly variable both in length and
in sequence. However, we find that it is possible to align
the TIRs of the sequences corresponding to casposon family
2 members (the most populous casposon family defined in
Krupovic et al.) such that a pattern of conserved base
pairs emerges within the terminal approximately 20 bp
(Figure 1B). This suggests that transposon-specific end
recognition by a casposon-encoded protein is reasonable.
(Casposon families 1 and 3 TIRs can also be aligned to
reveal conserved TIR motifs but have fewer representatives
than family 2.)
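
One simple way to make "a pattern of conserved base pairs" concrete is to score each column of an existing alignment by the frequency of its most common base. The sketch below assumes the sequences are already aligned to equal length; it is not the method used to produce Figure 1B, and the toy sequences are invented for illustration and are not casposon TIRs.

```python
from collections import Counter


def column_conservation(aligned):
    """For each alignment column, return (most common symbol, its fraction)."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("sequences must be pre-aligned to equal length")
    result = []
    for column in zip(*aligned):
        symbol, count = Counter(column).most_common(1)[0]
        result.append((symbol, count / len(aligned)))
    return result


def conserved_positions(aligned, threshold=1.0):
    """0-based columns whose most common base (not a gap) reaches the threshold."""
    return [
        index
        for index, (symbol, fraction) in enumerate(column_conservation(aligned))
        if symbol != "-" and fraction >= threshold
    ]


# Hypothetical, hand-made toy alignment (NOT real casposon TIR sequences).
toy = ["CAGTGACCTA", "CAGTAACCTT", "CAGTGTCCAA", "CAGTCACCGA"]

print(conserved_positions(toy))        # strictly invariant columns
print(conserved_positions(toy, 0.75))  # columns conserved in at least 3 of 4
```

Lowering `threshold` trades stringency for sensitivity; with few representatives, as for casposon families 1 and 3, any such score should be read with corresponding caution.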

The alignment in Figure 1 also suggests a resolution of a
second unusual feature of the sequences presented by
Krupovic et al., which is that the TSDs are reported to
vary in size from 1 to 27 nucleotides (nt). TSD size is
typically highly conserved in Insertion Sequences and DNA
transposon superfamilies, rarely varying by more than one
or two nt [2,15]. This is because TSD size is a direct
consequence of the spacing of the staggered cuts generated
by a transpososome assembled on target DNA, and it reflects
the distinct architecture of these multimeric protein-DNA
complexes, in particular the distance between and the
orientation of the two catalytic sites. When the TIRs of
casposon family 2 are aligned as in Figure 1B, the TSD size
(as TSDs are usually defined, that is, excluding any
overlap with the TIRs) now converges on 14 bp. This is
relatively large when compared to TSDs of most
characterized transposons, but is substantially less than
the range of 23 to 55 nt for the repeat size of CRISPR
systems. The thus-aligned TSD sequences also hint at yet
another feature of many characterized DNA transposons,
which is a preferred palindromic target site motif [19].
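
The link between staggered cuts and TSD size can be illustrated with a toy model of integration on a single strand of sequence. This is a minimal sketch under simplifying assumptions: the function name `integrate`, the parameters, and the sequences are hypothetical, and gap repair is modeled simply as duplication of the bases lying between the two cut positions.

```python
def integrate(target: str, element: str, cut_pos: int, stagger: int) -> str:
    """Insert element at a staggered cut in target.

    The two strands are assumed to be cut `stagger` nt apart, starting at
    cut_pos. After joining and gap repair, the bases between the cuts,
    target[cut_pos:cut_pos + stagger], appear on both sides of the element.
    """
    if stagger < 0 or not 0 <= cut_pos <= len(target) - stagger:
        raise ValueError("staggered cut must lie within the target")
    left_flank = target[:cut_pos + stagger]   # ends with the duplicated bases
    right_flank = target[cut_pos:]            # starts with the duplicated bases
    return left_flank + element + right_flank


# Hypothetical toy sequences; a 14-nt stagger mirrors the TSD size in the text.
target = "TTGACCATGCAAGGTCCATTGACA"
element = "GGGG"
cut_pos, stagger = 5, 14
product = integrate(target, element, cut_pos, stagger)

left_tsd = product[cut_pos:cut_pos + stagger]
right_start = cut_pos + stagger + len(element)
right_tsd = product[right_start:right_start + stagger]
print(left_tsd == right_tsd, len(product) - len(target) - len(element))
```

In this model the duplication length is fixed entirely by `stagger`, which is why a single transpososome architecture is expected to yield a near-constant TSD size rather than a range of 1 to 27 nt.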

Finally, it should be noted that all of the
casposon-associated Cas1 proteins identified by Krupovic
et al. possess the four conserved catalytic residues
expected for an active Cas1 nuclease (Supplemental Figure 1
in their report).



Conclusions 

The evidence is compelling that casposons possess some of
the expected properties of active DNA transposons. However,
as we are only beginning to understand how the multiple Cas
proteins in different CRISPR systems mediate immunity, the
evolutionary link between the CRISPR-associated Cas1
proteins and the casposon-associated Cas1 proteins provides
only limited insight into the possible mechanism of
casposon mobility. Many intriguing questions have been
raised by the report of Krupovic et al. Since two types of
nuclease are often associated with casposons, the Cas1
proteins and usually an HNH nuclease, does the latter have
a role? If so, do these nucleases work together and
interdependently to catalyze excision and integration? How
might Cas1 and a B family polymerase collaborate to
generate the proposed intermediate of the reaction, an
excised transposon flanked by double-strand breaks? How is
this related to the transposition mechanism of the
superfamily of self-synthesizing Polinton/Mavericks found
in eukaryotes [20,21], to which casposons are proposed to
be mechanistically, albeit not evolutionarily, related
[13]? Do the recurrent HTH domains identified within
casposons (for example, all the Cas1 proteins of casposon
family 2 have a conserved HTH appended to their C-termini)
play a role in the recognition of transposon ends or a
target site? Clearly, experimental biochemistry is needed
to answer these questions.